A novel approach for audible acoustic quick response codes

Compared to image-based quick response (QR) codes, acoustic QR codes have some advantages. However, an acoustic QR scanner cannot recognize an acoustic QR code at a distance of more than two meters from an acoustic QR announcer. To this end, we propose a new sort of acoustic QR code, called an audible acoustic QR code (AAQRC), which employs humanly audible sound to carry users’ information directly. First, a user’s string of characters is translated into a string of pitches. Then, the related algorithms convert the string of pitches into a playable audio file. As a result, an AAQRC is generated, consisting of the audio itself. AAQRC recognition is the opposite process of AAQRC generation. Compared with the existing approach for acoustic QR codes, the new method can recognize acoustic QR codes at a longer distance, even if there are obstacles between the AAQRC announcer and AAQRC scanner.

Background
MIDI file 14. The musical instrument digital interface (MIDI) was proposed to address the communication problem between electronic-acoustic instruments. As the most widely used standard music format, a MIDI file is regarded as "a music score understood by a computer". To date, MIDI has become one of the standard languages used by electronic musical instruments and computers, and an agreement about a set of messages (i.e., instructions). MIDI itself generates no sound signal. Instead, it records each musical note as a number and transmits various messages about these numbers over a cable. The electronic-acoustic equipment receiving a message generates sound or performs some action according to that message.
Basically, a MIDI file consists of two parts: a block about the file's header and a block about the audio tracks. The former block includes (1) a subblock identifying the type of file (4 bytes); (2) a subblock indicating the length of the next subblock called the data area of the current block (4 bytes); and (3) a subblock called the data area of the current block (6 bytes).
At the beginning of each MIDI file, the file's header block has the following hexadecimal string of numbers: "4d 54 68 64 00 00 00 06 ss ss nn nn tt tt". In this string, "4d 54 68 64" is the substring identifying the type of file, and it indicates that this file is a MIDI file. The value of the subsequent substring is "00 00 00 06" because the next subblock, called the data area of the current block, always has six bytes.
The meaning of the first two bytes in substring "ss ss nn nn tt tt" is as follows: "00 00" means that there is only one track; "00 01" means that there are multiple synchronous tracks; "00 02" means that there are multiple independent tracks. In addition, the substring "nn nn" specifies the number of tracks, while the substring "tt tt" specifies the time format, with the highest bit serving as a label. If the value of this bit is 0, tick timing is used. Otherwise, the SMPTE format is employed for timing.
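The header layout described above can be read with a few lines of code. The following is a minimal sketch, not the tooling used in this study; the function name `parse_midi_header` and the returned dictionary keys are hypothetical and chosen only for illustration.

```python
import struct


def parse_midi_header(data: bytes) -> dict:
    """Parse the 14-byte header block of a MIDI file (hypothetical helper)."""
    if len(data) < 14:
        raise ValueError("a MIDI header block needs 14 bytes")
    # Big-endian: 4-byte type, 4-byte length, then ss ss, nn nn and tt tt.
    chunk_type, length, fmt, ntracks, division = struct.unpack(">4sIHHH", data[:14])
    if chunk_type != b"MThd" or length != 6:
        raise ValueError("not a MIDI header block")
    # The highest bit of tt tt selects the timing format.
    timing = "SMPTE" if division & 0x8000 else "ticks"
    return {
        "format": fmt,                    # value of ss ss
        "tracks": ntracks,                # value of nn nn
        "timing": timing,
        "division": division & 0x7FFF,    # remaining 15 bits of tt tt
    }


header = parse_midi_header(bytes.fromhex("4d546864000000060001000201e0"))
print(header)
```

For the sample bytes above, the header describes a file with multiple synchronous tracks, two tracks in total, and tick timing.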
There are one or more blocks about the audio tracks after the block about the file's header in a MIDI file. Each audio track block includes three parts: (1) a subblock identifying the type of track (4 bytes); (2) a subblock indicating the length of the next subblock called the data area of the current block (4 bytes); and (3) a subblock called the data area of the current block (consisting of multiple MIDI events).
The first subblock is "4d 54 72 6b" in hexadecimal. A MIDI event consists of a delta-time (a variable-length count of ticks since the previous event) and a MIDI message. MIDI messages may be channel messages or system messages. Channel messages play a key role in recording music scores. Their main functions include releasing a note, pressing a note, touching a note, changing a controller, changing an instrument, and changing a pitch wheel. Other events cover setting the sequence in a track, text events, copyright notices, designating the name of a song or track, designating the musical instrument, lyrics and notes, termination of a track, specifying the speed, specifying the beat, and so on. For example, with a piano, the pitch C4 is recorded if one presses the C4 key at one time and releases it at a later time.
In this way, a MIDI file records a music score understood by a computer. Ref. 14 provides more details on the MIDI format, helping us understand the principle of translating a string of pitches into a MIDI file and the reverse procedure.
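To make the translation from a string of pitches to a MIDI file concrete, the sketch below writes a single-track file with one note-on and one note-off event per pitch. It is an illustration under stated assumptions, not the Overture-based workflow used in this paper: the helper names, the convention that C4 is note number 60, the velocity of 64 and the 480 ticks per note are all assumptions.

```python
import struct

# Semitone offsets of the natural notes within an octave.
SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_number(pitch: str) -> int:
    """Map a name such as 'C4' to a note number, assuming C4 = 60."""
    return 12 * (int(pitch[1:]) + 1) + SEMITONE[pitch[0]]


def vlq(value: int) -> bytes:
    """Encode the time offset before an event as a variable-length quantity."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(out))


def pitches_to_midi(pitches: str, ticks_per_note: int = 480) -> bytes:
    """Build a one-track MIDI file that plays the pitches one after another."""
    events = b""
    for name in pitches.split():
        n = note_number(name)
        events += vlq(0) + bytes([0x90, n, 64])              # press the note
        events += vlq(ticks_per_note) + bytes([0x80, n, 0])  # release it later
    events += vlq(0) + b"\xff\x2f\x00"                       # termination of track
    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, ticks_per_note)
    return header + b"MTrk" + struct.pack(">I", len(events)) + events


data = pitches_to_midi("C4 D6")
print(len(data), data[:4])
```

Reading the pitches back from such a file is the reverse procedure: skip the two block prefixes, then collect the note number of every note-on event.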
Measuring pitches using an algorithm. In short, the key principles of this sort of algorithm are as follows.
First, an acoustic sensor is employed to sense the vibrations caused by a pitch. On this basis, the acoustic sensor measures how much time (T) one vibration requires. Second, the frequency of the vibrations is f = 1/T.
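As a small worked example of the relation f = 1/T, the sketch below converts a measured period into a frequency and then into the nearest pitch name. The reference tuning (A4 = 440 Hz, twelve equal semitones per octave) and the function name are assumptions made for illustration; the source does not state which tuning its tools use.

```python
import math

NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_from_period(period_s: float) -> str:
    """Name the pitch whose vibration takes period_s seconds (assumes A4 = 440 Hz)."""
    freq = 1.0 / period_s  # f = 1/T
    # Distance from A4 in semitones, rounded to the nearest key.
    key = round(69 + 12 * math.log2(freq / 440.0))
    return f"{NAMES[key % 12]}{key // 12 - 1}"


print(pitch_from_period(1 / 440.0))
print(pitch_from_period(1 / 261.63))
```

A period of 1/440 s corresponds to A4, and a period of about 1/261.63 s corresponds to C4.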

The principle of audible acoustic QR codes
The principle and the algorithms. In brief, we employ humanly audible audio to directly encode user information in a QR code. The principle of the new approach is as follows. First, a one-to-one map between a set of frequently used characters and a set of frequently used pitches is constructed. Thus, a string of characters is translated to a string of pitches, and the latter string is employed to express a URL. As a result, an AAQRC will be generated if a piece of music (such as a piano piece) is generated, whereas this AAQRC will be recognized if this piece of music is played. The new method has four steps, as shown in Fig. 1 and algorithms 1 and 2.
It should be noted that AAQRC recognition has two optional modes/ways: recognizing a file (Mode 1) and playing and listening (Mode 2). The difference is that the MIDI file itself will be recognized with the former mode, while the sound being heard in the air will be recognized with the latter mode.
With Mode 1, step 6 calls an algorithm to translate a MIDI file into a string of pitches, using the procedure mentioned at the end of the "MIDI file" subsection. With Mode 2, step 6 calls an algorithm to translate a series of acoustic signals into a string of pitches, using the procedure mentioned in the "Measuring pitches using an algorithm" subsection.
Time complexity. For AAQRC generation, let length(x) = n. Step 1 completes its computational task within O(1) time, as does step 3. If there are m rows in the one-to-one map between the characters and the pitches (m different characters and m different pitches are used), seeking a given character or pitch will take O(m) time, so step 2 will consume O(m) time. In addition, step 4 will take O(n) time, according to the principle of the MIDI file mentioned in the previous section.
For AAQRC recognition, let length(string_pitches) = n. Step 5 can complete its computational task within O(1) time, as can steps 8 and 9. If there are m rows in the one-to-one map between the characters and the pitches (m different characters and m different pitches are used), seeking a given character or pitch will take O(m) time, so step 7 will consume O(m) time. In addition, step 6 will take O(n) time, according to the principles of the procedures mentioned in the previous section, regardless of whether Mode 1 or Mode 2 is used. In other words, the proposed algorithms have polynomial complexities, and they can complete their computational tasks in polynomial time.

A case study
Let us take the official website of Zhengzhou University ("www.zzu.edu.cn") as an example to test the process of AAQRC generation and recognition. Table 1 shows the platform and tools used in our experiments.
Then, Fig. 3 illustrates a music score of f2 with Overture 5 2 . Using Overture, we generate a playable MIDI file called "testzzu_h.mid" according to the music score of f2. This MIDI file itself is the produced AAQRC for the homepage of Zhengzhou University.
The process of AAQRC recognition from this audio is as follows. First, the string of pitches f3 is read directly from "testzzu_h.mid" using MidiEditor 3 (Mode 1 is used). As shown in Fig. 4, f3 = "F7 F7 F7 C4 B7 B7 D7 C4 B4 A4 D7 C4 G4 D6". Clearly, f2 = f3 holds. Finally, as shown in Fig. 5, the value of f3 is input, and step 4 in the new method translates f3 into the corresponding string of characters, i.e., f4 = "www.zzu.edu.cn". Clearly, f1 = f4 holds, indicating that the recognized URL equals the intended URL. AAQRC generation and recognition are therefore successful in this example.
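The translation between characters and pitches in this example can be sketched in a few lines. The eight character-pitch pairs below are inferred by aligning the pitch string f3 with the URL it decodes to (with the spaces dropped); they are not a reproduction of the full map, and the function names `encode` and `decode` are hypothetical.

```python
# Pairs inferred from this case study only; the full map has more rows.
CHAR_TO_PITCH = {
    "w": "F7", ".": "C4", "z": "B7", "u": "D7",
    "e": "B4", "d": "A4", "c": "G4", "n": "D6",
}
PITCH_TO_CHAR = {pitch: char for char, pitch in CHAR_TO_PITCH.items()}


def encode(url: str) -> str:
    """Translate a string of characters into a string of pitches."""
    return " ".join(CHAR_TO_PITCH[ch] for ch in url)


def decode(pitches: str) -> str:
    """Translate a string of pitches back into a string of characters."""
    return "".join(PITCH_TO_CHAR[p] for p in pitches.split())


f3 = encode("www.zzu.edu.cn")
print(f3)
print(decode(f3))
```

Because the map is one-to-one, decoding the encoded string returns the original URL, which mirrors the check that f1 = f4.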
It should be noted that a QR announcer can also show its AAQRC by playing "testzzu_h.mid", whereas a QR scanner can recognize this AAQRC by listening to this audio (Mode 2 is used). We employ the loudspeaker listed in Table 1 to play the audio at a normal volume and use the pickup listed in Table 1 to pick up the sound. The distance between the pickup and the loudspeaker is set to 3 m, and these two devices are separated by a baffle. An online tool called Bideyuanli 4 is employed to convert the sound collected by the pickup into a string of pitches f3'. As shown in Fig. 6, f3' = "F7 F7 F7 C4 B7 B7 D7 C4 B4 A4 D7 C4 G4 D6". Clearly, f3 = f3' holds, indicating that all pitches are correctly identified.

Experiments
Experimental objective. We aim to explore whether the new method is effective. To be specific, can an AAQRC scanner effectively recognize the URL information sent by an AAQRC announcer at a distance?
Experimental platform. Please see Table 1. This table depicts the experimental platform used in this study.
It should be noted that all the acoustical equipment was selected randomly, without any special consideration.
Experimental procedure.
Step (1). Thirty different URLs are selected randomly: ten URLs contain ten characters each, another ten contain twenty characters each, and the remaining ten contain thirty characters each.
Step (2). For each of the thirty URLs, we produce the corresponding string of pitches using Overture according to a given relationship between characters and pitches. On this basis, thirty MIDI files are generated.
Step (3). Each of the thirty MIDI files is played on a machine with a loudspeaker, and another machine with a pickup receives the acoustic signals and tries to recognize them at a distance. In other words, Mode 2 is employed since AAQRC recognition in Mode 1 is easier.
Step (4). For each of the thirty MIDI files, the recognized acoustic signals are translated to the corresponding strings of pitches using Bideyuanli.
Step (5). For each of the thirty obtained strings of pitches, the recognized string of characters is obtained according to the given relationship between characters and pitches.
Experimental results and some discussions. In our experiments, the second columns of Tables 2, 3 and 4 depict the thirty produced URLs, and the third columns of Tables 2, 3 and 4 illustrate the thirty corresponding strings of pitches. The relationship between characters and pitches is given in Table 5. Furthermore, the thirty generated MIDI files are shown in the fourth columns of Tables 2, 3 and 4. The thirty music scores of these MIDI files are illustrated in Fig. 7.
There are two questions worth studying. The first concerns the distance between the two machines. The other concerns obstacles, such as a baffle, between the two machines.
To this end, we set up four different scenarios, as shown in Table 6. The four scenarios differ in the distance between the loudspeaker and the pickup and in whether there are obstacles between them. The key point is that the decibels measured at the pickup remain unchanged (at least 30 decibels higher than background noise). As shown in Table 6, the results indicate that all thirty AAQRCs are correctly recognized.
Next, the decibels measured at the loudspeaker remain unchanged, while the decibels measured at the pickup change. This time, the results are somewhat different, as depicted in Table 7. Figure 8 summarizes the results of Tables 6 and 7.
The relative sound volume is defined as the sound volume at the pickup minus the volume of background noise. If the relative sound volume at the pickup is not less than 30 decibels, all strings of pitches can be correctly and completely identified. In our experiments, this conclusion holds regardless of the length of the string of pitches, the distance between the pickup and the loudspeaker, and whether there are obstacles between the pickup and the loudspeaker. In contrast, if the relative sound volume at the pickup is lower than 30 decibels, the accuracy of recognition of strings of pitches decreases sharply with decreasing decibels. In other words, among the factors tested, the relative sound volume is the only one affecting the accuracy of recognition. The process of recognition is not affected by environmental noise or obstacles as long as the difference between the sound volume at the pickup and that of the noise is not lower than 30 decibels.
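The two quantities used in this discussion, the recognition accuracy t1 = m/n and the relative sound volume at the pickup, are simple to compute. The sketch below is illustrative only: the function names are hypothetical, and counting correct characters position by position is an assumption, since the source does not specify how m is counted.

```python
def char_accuracy(sent: str, recognized: str) -> float:
    """t1 = m/n: share of the n sent characters that are recognized correctly.

    Assumption: characters are compared position by position.
    """
    m = sum(1 for a, b in zip(sent, recognized) if a == b)
    return m / len(sent)


def relative_volume(pickup_db: float, noise_db: float) -> float:
    """Sound volume at the pickup minus the volume of background noise (dB)."""
    return pickup_db - noise_db


def meets_threshold(pickup_db: float, noise_db: float, threshold_db: float = 30.0) -> bool:
    """True if the relative sound volume is not less than the 30-decibel level."""
    return relative_volume(pickup_db, noise_db) >= threshold_db


print(char_accuracy("www.zzu.edu.cn", "www.zzu.edu.cn"))
print(meets_threshold(pickup_db=70.0, noise_db=40.0))
print(meets_threshold(pickup_db=65.0, noise_db=40.0))
```

The decibel values in the usage lines are made-up inputs, not measurements from Tables 6 and 7.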
Furthermore, since acoustic scene classification (ASC) 19 is important for noise reduction, it could be used to help an AAQRC work against louder background noise without increasing the volume of AAQRC playback.

Comparisons between this work and related ones
Comparison with other acoustic-based approaches. Several notable studies have been conducted in the field of QR codes related to acoustics.
An approach called acoustic QR codes, which differs from the new approach, was presented in Ref. 1. Table 8 provides some differences between the two methods.
The information in acoustic QR codes is difficult to identify correctly once the distance between the loudspeaker and the pickup reaches 2 m 1. In contrast, an AAQRC scanner (with a pickup) can correctly identify a URL sent by an AAQRC announcer from 10 m away. According to the above experimental results, we have reason to believe that the new method can still achieve this even if the distance is larger, as long as the relative sound volume stays at 30 decibels or more.

Table 5. The relationship between characters and pitches (columns: Character, Pitch).

In addition, Ref. 1 does not report whether the existing method based on acoustic QR codes works if there is an obstacle between the announcer and the scanner. In contrast, an AAQRC scanner (with a pickup) can correctly identify a URL sent by an AAQRC announcer, even if there are two obstacles between the announcer and the scanner. According to the above experimental results, we have reason to believe that the new method can still achieve this even if more obstacles are present, as long as the relative sound volume stays at 30 decibels or more.
These comparisons highlight the advantages of the new method. The reason is that the new method carries users' information via sounds that can be heard by humans. In contrast, the approach in Ref. 1 embeds faint inaudible acoustic signals expressing users' information into an MCLT so the acoustic signals expressing users' information become background noise, which is covered by the MCLT. This is the fundamental difference between the method in Ref. 1 and the new one. This difference leads to the advantages of the new method.
Audio data transmission (ADT) is a method that sends a message signal through the air as sound 6-9. Mehrabi et al. found that ADT provides a rapid means of transferring data, in contrast to Bluetooth and image-based QR methods, while requiring minimal physical effort and user coordination 8. This is the advantage of ADT compared with Bluetooth and image-based QR methods. In fact, ADT is the basis of the acoustic-based QR technique. Thus, acoustic-based QR methods have the same advantages compared to image-based QR methods. However, inventing an image sensor does not mean inventing an image-based QR technique, although an image-based QR code transmits data through an image sensor. Likewise, proposing the ADT technique does not mean proposing the acoustic-based QR technique, although an acoustic-based QR code transmits data via ADT. Whereas Refs. 6-9 discuss ADT, this paper and Ref. 1 address an acoustic-based QR technique.

Table 6. The result of recognition when one machine plays MIDI files with a loudspeaker and another machine picks up the sound and tries to recognize it using Bideyuanli (the average decibels d1 measured at the pickup remain unchanged, the average decibels d2 measured at the loudspeaker change, and the background noise is d3 decibels). Let t1 = m/n if a URL has n characters and m characters are recognized correctly, and let t2 = d1 - d3 = 30.

Table 7. The result of recognition when one machine plays MIDI files with a loudspeaker and another machine picks up the sound and tries to recognize it using Bideyuanli (the average decibels d1 measured at the loudspeaker remain unchanged, the average decibels d2 measured at the pickup change, and the background noise is d3 decibels). Let t1 = m/n if a URL has n characters and m characters are recognized correctly, and let t2 = d2 - d3.
Value of t1 \ t2 (scenario 3) (The distance between the two machines is 5 m and there is one wall between them)
Value of t1 \ t2 (scenario 4) (The distance between the two machines is 10 m and there are two walls between them)

Figure 8. Summary of the results: (a) in Table 6; (b) in Table 7.

In addition, the experimental scenarios in Ref. 6 are similar to those in Ref. 1, and no scenario was tested in which the distance between the transmitter and receiver exceeds one meter. In contrast, the new method can complete its task even if the distance grows tenfold, again highlighting the advantage of the new method.
Chung proposed the effective short-distance transmission of advertisements for smart devices using high frequencies that are not audible to humans 10. However, these high frequencies only form trigger signals that enable a smart device to execute a process of advertisement transmission. The advertisement itself is transmitted via a Wi-Fi network rather than an acoustic channel. Thus, the method in Ref. 10 is an image-based QR code rather than an acoustics-based QR code, although the traditional image-based technique is developing in the direction of artistry and robustness 11.
In short, a number of important and significant related works have appeared, whereas the approach proposed in this paper is different.
Comparison with image-based approaches. Currently, the image-based QR method is the popular QR technique, complementing the proposed technique.
First, let us consider security, as shown in Table 9. A scanning user does not know all the information of every black dot and white dot in a QR image. If the URL is tampered with by a hacker and some information in the black and white dots is altered, the user does not know this. Thus, a legal image-based QR code can be replaced covertly by a fake code. If the proposed method is used, what a user perceives is music consisting of a string of pitches, not an image consisting of a large number of black dots and white dots. For the user, it is easy to realize that the music has been changed if a hacker covertly replaces the real URL with a fake URL. Which is easier to perceive: a piece of music that is off-key, or a few modified dots among a large number of irregularly gathered black and white dots? The answer is obvious. That is why the new method is more effective in terms of combating tampering attacks.
Considering that a single block can store only one Mbyte at most and that some aesthetic QR images occupy several Mbytes, one can hardly expect the blockchain to help these aesthetic image-based QR codes combat tampering attacks. In contrast, an AAQRC MIDI file occupies only about 1 Kbyte when a URL has one hundred characters. Thus, the blockchain will be useful in dealing with tampering attacks if the proposed method rather than image-based QR methods is employed.
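The 1 Kbyte figure is consistent with a rough byte count of a single-track MIDI file. The estimate below is a sketch under stated assumptions, not a measurement of the files produced in this study: one note per character, a 1-byte time offset before each note-on, a 2-byte time offset before each note-off, no tempo or instrument events, and a hypothetical function name.

```python
def estimated_midi_bytes(num_chars: int) -> int:
    """Rough size of a one-track MIDI file carrying one note per character."""
    header_block = 14     # type (4 bytes) + length (4 bytes) + data area (6 bytes)
    track_prefix = 8      # track type (4 bytes) + data area length (4 bytes)
    note_on = 1 + 3       # assumed 1-byte time offset + 3-byte message
    note_off = 2 + 3      # assumed 2-byte time offset + 3-byte message
    end_of_track = 1 + 3  # time offset + termination-of-track event
    return header_block + track_prefix + num_chars * (note_on + note_off) + end_of_track


print(estimated_midi_bytes(100))  # 926
```

Files exported by a score editor usually carry extra events such as tempo and instrument settings, so real sizes will be somewhat larger than this lower-bound style estimate.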
It is generally accepted that a QR image itself has little capacity to carry a virus, given the number of black and white dots. However, it is difficult for a user to establish a one-to-one map between these dots and the characters in a URL, because they are not equal in number. That is, some dots do not carry any URL information. Thus, the following possibility cannot be ruled out: a hacker employs some "redundant" dots to carry malware code covertly. In contrast, it is impossible for a piece of AAQRC music to carry a virus covertly because each character in a URL is mapped to a pitch in a string. That is, if any virus information were embedded, the music would become longer, and the user would notice that something is abnormal.
Second, robustness is also important. In short, the recognition effect of an image-based QR code will be poor if the light is too weak, while the recognition effect of an acoustic-based AAQRC will be poor if there is too much noise. For example, a QR image cannot be recognized in an air-gapped way at an outdoor location without enough light at night, while AAQRC music is hard to recognize in an air-gapped way on a busy street.

What's the principle of your QR generation? Convert a string into pitches, and produce an audio (Um, it is sort of complicated, and it is a long story…)
What's the principle of your QR recognition? Read pitches from the audio, and convert them into a string (Um, it is sort of complicated, and it is a long story…)
Can your QR work if the announcer and scanner are far away from each other?
Can your QR work if there are obstacles between the announcer and scanner? Yes (Hard to say)
Let us consider some highly significant real-world scenarios as potential applications. Sometimes, you have to join a queue to scan a QR code and keep others at a distance before entering an indoor place. Such real-world scenes are very common in China's COVID-19 epidemic prevention and control, especially in a very large number of railway stations, hospitals, sites of very large-scale nucleic acid testing, and other public places all over the country. In this situation, how can QR codes conveniently help keep people safe if you cannot expect a person to scan an image-based QR code at night, in the rain, or under the blazing sun?
Of course, an image-based QR code can still be used if a few black and white dots are blurred, whereas an AAQRC cannot be used if one pitch is inaccurate. The reason is that a QR image contains some redundant information, whereas no redundancy occurs in an AAQRC. Thus, this is a disadvantage rather than an advantage of AAQRC. However, this problem does not need to be considered in many practical cases. For example, a source with unified authentication will easily eliminate any inaccurate pitch in the real-world scenarios mentioned above, which are relevant to COVID-19 epidemic prevention and control, in a potential application.
Third, let us think about artistry. Which will make users more comfortable: an image-based QR code or the acoustic-based AAQRC? Ordinary QR codes present two colors: black and white. To improve the artistry of QR codes, our lab put forward a sort of aesthetic-based QR technique 11, called "Meiyao" 12, which has played an important role in the control of COVID-19 outbreaks in many cities in Henan Province, China 13. In fact, Meiyao provides users not only a QR function but also a delightful user experience 11, due to rich colors and beautiful images, without affecting the robustness. For the method proposed in this paper, we aim to enhance the user experience from the perspective of sound rather than vision. Which one is better? One man's meat is another man's poison. We therefore performed a test: a poll on artistry and favorability among 100 randomly selected students at Zhoukou Normal University. To ensure fairness, the selected students were majoring in science and engineering, fields that have nothing to do with music, painting and art. Each student evaluated Meiyao and AAQRC independently, according to his or her own feelings, after using a given group of prototype Meiyao codes and prototype AAQRC codes. Each student had the following three mutually exclusive options: "I prefer this sort of QR code (Meiyao or AAQRC) to traditional QR codes based on black and white dots", "whatever this sort of QR code (Meiyao or AAQRC), or traditional QR codes based on black and white dots, I don't care", and "I dislike this sort of QR code (Meiyao or AAQRC)". Figure 9 illustrates the result of this poll. Slightly more students chose AAQRC than Meiyao as their favorite, although the gap is tiny, which suggests different strokes for different folks.
Fourth, accessibility is vital for users. There are two ways to access a QR image or AAQRC music: air-gap access and local access. On the AAQRC side, they are Mode 2 and Mode 1, respectively. In the former mode, a transmitter displays images or plays sounds, and the visual signals of the images and the acoustic signal of the sounds travel through the air before they are received by a receiver. In the latter mode, neither visual signals in terms of images nor acoustic signals in terms of sounds travel through the air, so the receiver only needs to recognize a QR image or AAQRC music on the local machine. Thus, we only need to consider the former way when we talk about accessibility. Table 10 provides some comparisons.
For example, on a campus or in a shopping mall, an AAQRC will be more suitable than an existing image-based QR code if a QR code needs to be distributed at a large scale and in a nondirectional way. The reason is that high-power loudspeakers are more common than very large screens on a campus or in the indoor space of a shopping mall.
In terms of accessibility, speed needs special attention. In theory, AAQRC is slower than the image-based QR methods because listening to a piece of music expressing an AAQRC takes more time than scanning a traditional QR image. However, the reality may be somewhat different in many cases. We performed another test, as follows.
The 100 persons mentioned above lined up outside, waiting to enter an indoor space. Everyone needed to "scan" a QR code before entering the door. There were two optional "scanning" ways: one was to scan an image-based QR code, and the other was to use an AAQRC. Our test results show that 14 persons entered the door per minute on average using the former way. In contrast, 16 persons entered the door per minute on average using the latter way. Clearly, an AAQRC is not slower than a traditional image-based QR code in this test. The reason is that even if you are further back in the queue, you can hear the music expressing the AAQRC and complete the process of AAQRC "scanning". In contrast, you must wait for the queue to move until you arrive at the entrance of the room to complete the process of traditional QR scanning.
We take COVID-19 epidemic prevention and control as an example of a potential application. Supposing that a real-world scene with a queue is relevant to COVID-19 epidemic prevention and control, the fact mentioned in the previous paragraph shows that speed is not an obstacle for an AAQRC in some vital real-world scenarios, compared to image-based QR methods. Of course, several AAQRCs playing at the same time (multi-play) can disturb AAQRC recognition. However, multi-play can be prohibited in such a highly significant real-world scenario. As a result, this problem can be solved easily.
As analyzed above, the new method has some advantages and limitations compared with the image-based QR technique. In terms of shortcomings and limitations, AAQRC music is difficult to recognize in an air-gapped way in a busy street, as mentioned above. In addition, it will take a relatively long time to play an AAQRC once in some scenarios if the corresponding URL has too many characters.
In summary, what matters is a combination of security, robustness, artistry and accessibility. We can safely say that the image-based approaches and the newly proposed approach complement each other, according to the comprehensive analysis, tests and comparisons mentioned above. It should be noted that we do not think the new method is superior to the existing ones in terms of all the metrics. So what? It is not necessary to let the new method achieve this goal.
Some studies are relevant to sound, images and QR functions. For example, Sarkar et al. presented an interesting approach for tackling multiple QR codes all at once, in which multimedia data, including text, images, and audio data, can be converted to QR codes 17. However, the generated QR objects waiting to be scanned still exist in PDF files or on printed paper. Thus, this method is an image-based QR method rather than an acoustic-based QR method.
More related works. Next, we will briefly survey a bigger picture or roadmap.
There were some early works 20,21 using audible acoustic signals for wireless communications. However, their ranges did not exceed 0.5 m, causing these methods to be considered near-field communication rather than QR codes. Furthermore, another method implements communication by embedding messages in audible audio 22 . However, the high frequency sound used is particularly sharp, and it lies beyond the scope of the frequencies of sound that people often hear in daily life. As a result, this method is a great one for short-range communications on some occasions, but it is not suitable for QR codes for daily use.
For image-based QR codes, there have been many studies in recent years, including but not limited to the following.
First, readability (robustness) is very important to a QR image. Deformation may reduce the readability of a QR image. To this end, Ref. 23 proposed a method to embed QR codes onto freeform surfaces using a low-end consumer-level 3D printer when deformation of QR images is caused by object surfaces that are not flat. Refs. 24,27 also introduced some methods to address issues related to deformation and readability. In addition, Ref. 31 proposed an algorithm for QR images that tries to address out-of-focus problems, which have an impact on QR readability.
Second, QR codes are closely related to some issues of information security, such as secret sharing via QR codes 25,35 , QR security in mobile payments 34 and QR detection against a malicious URL 26 .
As everyone knows, QR codes are often used to collect data, which may lead to the issue of data privacy in some cases. More broadly, how do we realize a good tradeoff between the availability of data and privacy preservation in the course of data processing in several fields? Prof. Qi proposed some illuminating approaches 43-45, providing great insights into the above question.
Third, some extended forms of QR codes have emerged, aiming to meet various real-world requirements, such as dual-modulated QR codes for proximal privacy and security 28 and "Meiyao" for QR artistic quality 11,12,30. It should be noted that some interesting developments have taken place. For example, black modules in standard QR codes can be replaced by specific texture patterns 32, and a URL can be obtained by decoding a common picture that seems to have nothing to do with QR 33. Furthermore, 3D 37 and 4D QR codes 36 have already been developed, although traditional QR codes are considered to be essentially 2D matrix images.
Fourth, QR images need to be presented on a microscopic scale 29 in some situations. A state-of-the-art technique can inscribe a QR code composed of a set of 25 × 25 microdots, and each microdot has a diameter of approximately 14 µm 38. In fact, a QR code can be integrated into a microdevice with a size of hundreds of microns 39. In addition, a material method for micro QR codes has also been discussed 40.
Fifth, the application of QR codes is always a research focus. To date, this technique has been applied to not only life but also various fields of science, such as optical retrieval 41 and taxonomy of species 42 .

Conclusions
Audible sounds made by humans other than natural language, such as an infant's cry, can convey a certain message 18. The newly proposed method carries and transfers URL information with a kind of artificial audible sound outside natural language, i.e., piano music. On the one hand, no QR image is generated. On the other hand, it is possible to "scan" such a QR sound remotely even if there are obstacles between the QR announcer (loudspeaker) and QR scanner (pickup). Both are benefits of using the new approach. These characteristics indicate that the new method is more practical than existing acoustic QR methods and complements existing image-based QR methods, which suggests good prospects for future applications of the new approach in practice.

Data availability
All data generated or analyzed during this study are included in this published article.